Overview

Dataset statistics

Number of variables: 15
Number of observations: 6362620
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 728.1 MiB
Average record size in memory: 120.0 B
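The memory figures are consistent with every column occupying 8 bytes per row. That is what int64, float64 and object-pointer columns use when pandas measures memory without deep introspection. The report does not list dtypes, so treat the 8-byte figure as an assumption. Under it, the arithmetic reproduces both the totals above and the 48.5 MiB per-column figure reported for each variable:

```python
# Sketch: reproduce the reported memory figures, ASSUMING each of the
# 15 columns uses 8 bytes per row (shallow memory usage).
N_ROWS = 6_362_620
N_COLS = 15
BYTES_PER_VALUE = 8  # assumption, not stated in the report

MIB = 1024 ** 2

record_size_bytes = N_COLS * BYTES_PER_VALUE      # bytes per row
total_mib = N_ROWS * record_size_bytes / MIB      # whole frame
column_mib = N_ROWS * BYTES_PER_VALUE / MIB       # a single column

print(f"Average record size: {record_size_bytes:.1f} B")
print(f"Total size in memory: {total_mib:.1f} MiB")
print(f"Memory size per column: {column_mib:.1f} MiB")
```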

Variable types

Numeric: 9
Categorical: 6

Alerts

name_orig has high cardinality: 6353307 distinct values [High cardinality]
name_dest has high cardinality: 2722362 distinct values [High cardinality]
step is highly correlated with day [High correlation]
amount is highly correlated with diff_org_balance and 1 other field [High correlation]
oldbalance_org is highly correlated with newbalance_orig [High correlation]
newbalance_orig is highly correlated with type and 1 other field [High correlation]
oldbalance_dest is highly correlated with newbalance_dest [High correlation]
newbalance_dest is highly correlated with oldbalance_dest [High correlation]
diff_org_balance is highly correlated with amount and 1 other field [High correlation]
diff_dest_balance is highly correlated with amount and 1 other field [High correlation]
day is highly correlated with step [High correlation]
type is highly correlated with newbalance_orig and 1 other field [High correlation]
merchant_dest is highly correlated with type [High correlation]
amount is highly skewed (γ1 = 30.99394948) [Skewed]
diff_org_balance is highly skewed (γ1 = -30.07475092) [Skewed]
diff_dest_balance is highly skewed (γ1 = -30.33109298) [Skewed]
name_orig is uniformly distributed [Uniform]
oldbalance_org has 2102449 (33.0%) zeros [Zeros]
newbalance_orig has 3609566 (56.7%) zeros [Zeros]
oldbalance_dest has 2704388 (42.5%) zeros [Zeros]
newbalance_dest has 2439433 (38.3%) zeros [Zeros]
diff_org_balance has 1361836 (21.4%) zeros [Zeros]
diff_dest_balance has 1032168 (16.2%) zeros [Zeros]
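The Skewed and Zeros alerts can be reproduced from the per-variable statistics with two simple threshold rules. Both thresholds below are assumptions: a skewness cutoff of |γ1| > 20 and a zeros cutoff of 10% of rows. They were chosen only because they separate the flagged columns from the unflagged ones in this report. For example, oldbalance_dest has γ1 ≈ 19.92 and is not flagged as skewed, and amount has only 16 zeros. The actual settings are in config.json.

```python
# Hypothetical re-derivation of two alert types from summary statistics.
# Both thresholds are ASSUMPTIONS chosen to match this report; the real
# values come from the profiling configuration (config.json).
SKEW_THRESHOLD = 20.0
ZEROS_THRESHOLD = 0.10  # fraction of rows

N_ROWS = 6_362_620

# (skewness, zero count) copied from the per-variable sections of the report
stats = {
    "step": (0.3751768885, 0),
    "amount": (30.99394948, 16),
    "oldbalance_org": (5.249136421, 2102449),
    "newbalance_orig": (5.176884001, 3609566),
    "oldbalance_dest": (19.92175792, 2704388),
    "newbalance_dest": (19.35230206, 2439433),
    "diff_org_balance": (-30.07475092, 1361836),
    "diff_dest_balance": (-30.33109298, 1032168),
    "day": (0.3778477393, 0),
}

def alerts_for(skew, zeros):
    """Return the alert labels this rule set would attach to one column."""
    labels = []
    if abs(skew) > SKEW_THRESHOLD:
        labels.append("Skewed")
    if zeros / N_ROWS > ZEROS_THRESHOLD:
        labels.append("Zeros")
    return labels

for name, (skew, zeros) in stats.items():
    print(name, alerts_for(skew, zeros))
```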

Reproduction

Analysis started: 2023-01-04 17:57:06.810018
Analysis finished: 2023-01-04 18:12:53.428313
Duration: 15 minutes and 46.62 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

step
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 743
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 243.3972456
Minimum: 1
Maximum: 743
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 1
5th percentile: 16
Q1: 156
Median: 239
Q3: 335
95th percentile: 490
Maximum: 743
Range: 742
Interquartile range (IQR): 179

Descriptive statistics

Standard deviation: 142.331971
Coefficient of variation (CV): 0.5847723161
Kurtosis: 0.329070555
Mean: 243.3972456
Median absolute deviation (MAD): 92
Skewness: 0.3751768885
Sum: 1548644183
Variance: 20258.38998
Monotonicity: Increasing
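The derived figures above follow standard definitions:

- CV is the standard deviation divided by the mean.
- IQR is Q3 minus Q1.
- Variance is the standard deviation squared.
- MAD is the median of the absolute deviations from the median.

The sketch below implements these definitions in plain Python on a small made-up sample. It assumes pandas-style conventions: a sample standard deviation with n - 1 in the denominator, and linearly interpolated quartiles. It is not the profiler's own code. The last lines check the formulas against the step figures reported above.

```python
import statistics

def describe(xs):
    """Compute a few of the report's descriptive statistics for a list of numbers.

    Assumes pandas-like conventions: sample standard deviation (n - 1) and
    linearly interpolated quartiles (the stdlib "inclusive" method).
    """
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation, n - 1
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    mad = statistics.median(abs(x - median) for x in xs)  # median absolute deviation
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "mad": mad,
    }

# Made-up sample, not taken from the dataset.
print(describe([1, 2, 3, 4, 100]))

# Consistency checks against the reported figures for step.
std_step, mean_step = 142.331971, 243.3972456
print(round(std_step / mean_step, 6))  # compare with reported CV 0.5847723161
print(round(std_step ** 2, 2))         # compare with reported variance 20258.38998
print(335 - 156)                       # compare with reported IQR 179
```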
Histogram with fixed size bins (bins=50)
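"Fixed size bins" means the range between the minimum and the maximum is split into equal-width intervals. For step, 50 bins over the range 1 to 743 gives a width of 742 / 50 = 14.84 steps per bin. Below is a minimal equal-width binning sketch in plain Python. It assumes numpy-style edges, where every bin is half-open except the last, which also includes the maximum; whether the profiler bins exactly this way is an assumption.

```python
def equal_width_histogram(xs, bins):
    """Count values into `bins` equal-width intervals spanning [min(xs), max(xs)].

    Follows numpy.histogram's convention: every bin is half-open [a, b)
    except the last, which also includes the maximum.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        idx = int((x - lo) / width) if width else 0
        counts[min(idx, bins - 1)] += 1  # clamp the maximum into the last bin
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

# Bin width for the step column: range 742 split into 50 bins.
print((743 - 1) / 50)

# Toy example: three bins of width 3 over the range 1..10.
counts, edges = equal_width_histogram([1, 2, 2, 3, 10], bins=3)
print(counts, edges)
```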
Common Values

Value  Count  Frequency (%)
19     51352  0.8%
18     49579  0.8%
187    49083  0.8%
235    47491  0.7%
307    46968  0.7%
163    46352  0.7%
139    46054  0.7%
403    45155  0.7%
43     45060  0.7%
355    44787  0.7%
Other values (733)  5890739  92.6%

Minimum 10 values

Value  Count  Frequency (%)
1      2708   < 0.1%
2      1014   < 0.1%
3      552    < 0.1%
4      565    < 0.1%
5      665    < 0.1%
6      1660   < 0.1%
7      6837   0.1%
8      21097  0.3%
9      37628  0.6%
10     35991  0.6%

Maximum 10 values

Value  Count  Frequency (%)
743    8      < 0.1%
742    14     < 0.1%
741    22     < 0.1%
740    6      < 0.1%
739    10     < 0.1%
738    10     < 0.1%
737    10     < 0.1%
736    14     < 0.1%
735    12     < 0.1%
734    8      < 0.1%

type
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

CASH_OUT: 2237500
PAYMENT: 2151495
CASH_IN: 1399284
TRANSFER: 532909
DEBIT: 41432

Length

Max length: 8
Median length: 7
Mean length: 7.422395963
Min length: 5

Characters and Unicode

Total characters: 47225885
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
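The character statistics can be recomputed from the value counts alone, because each distinct value contributes its characters once per occurrence. The sketch below copies the value counts for type from this report. It uses Python's unicodedata module, whose general categories "Lu" (Uppercase Letter) and "Pc" (Connector Punctuation) correspond to the two categories reported for this column.

```python
import unicodedata
from collections import Counter

# Value counts for `type`, copied from the report.
type_counts = {
    "CASH_OUT": 2237500,
    "PAYMENT": 2151495,
    "CASH_IN": 1399284,
    "TRANSFER": 532909,
    "DEBIT": 41432,
}

char_counts = Counter()
category_counts = Counter()
for value, n in type_counts.items():
    for ch in value:
        char_counts[ch] += n                              # each occurrence adds its characters
        category_counts[unicodedata.category(ch)] += n    # "Lu" or "Pc" here

total_chars = sum(char_counts.values())
n_rows = sum(type_counts.values())

print("Total characters:", total_chars)
print("Distinct characters:", len(char_counts))
print("Mean length:", total_chars / n_rows)
print("Categories:", dict(category_counts))
print("Top characters:", char_counts.most_common(3))
```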

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAYMENT
2nd row: PAYMENT
3rd row: TRANSFER
4th row: CASH_OUT
5th row: PAYMENT

Common Values

Value     Count    Frequency (%)
CASH_OUT  2237500  35.2%
PAYMENT   2151495  33.8%
CASH_IN   1399284  22.0%
TRANSFER  532909   8.4%
DEBIT     41432    0.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Words (lowercased)

Value     Count    Frequency (%)
cash_out  2237500  35.2%
payment   2151495  33.8%
cash_in   1399284  22.0%
transfer  532909   8.4%
debit     41432    0.7%

Most occurring characters

Value  Count    Frequency (%)
A      6321188  13.4%
T      4963336  10.5%
S      4169693  8.8%
N      4083688  8.6%
C      3636784  7.7%
H      3636784  7.7%
_      3636784  7.7%
E      2725836  5.8%
O      2237500  4.7%
U      2237500  4.7%
Other values (8)  9576792  20.3%

Most occurring categories

Value                  Count     Frequency (%)
Uppercase Letter       43589101  92.3%
Connector Punctuation  3636784   7.7%

Most frequent character per category

Uppercase Letter

Value  Count    Frequency (%)
A      6321188  14.5%
T      4963336  11.4%
S      4169693  9.6%
N      4083688  9.4%
C      3636784  8.3%
H      3636784  8.3%
E      2725836  6.3%
O      2237500  5.1%
U      2237500  5.1%
Y      2151495  4.9%
Other values (7)  7425297  17.0%

Connector Punctuation

Value  Count    Frequency (%)
_      3636784  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   43589101  92.3%
Common  3636784   7.7%

Most frequent character per script

Latin

Value  Count    Frequency (%)
A      6321188  14.5%
T      4963336  11.4%
S      4169693  9.6%
N      4083688  9.4%
C      3636784  8.3%
H      3636784  8.3%
E      2725836  6.3%
O      2237500  5.1%
U      2237500  5.1%
Y      2151495  4.9%
Other values (7)  7425297  17.0%

Common

Value  Count    Frequency (%)
_      3636784  100.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  47225885  100.0%

Most frequent character per block

ASCII

Value  Count    Frequency (%)
A      6321188  13.4%
T      4963336  10.5%
S      4169693  8.8%
N      4083688  8.6%
C      3636784  7.7%
H      3636784  7.7%
_      3636784  7.7%
E      2725836  5.8%
O      2237500  4.7%
U      2237500  4.7%
Other values (8)  9576792  20.3%

amount
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5316900
Distinct (%): 83.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179861.9035
Minimum: 0
Maximum: 92445516.64
Zeros: 16
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 2224.0995
Q1: 13389.57
Median: 74871.94
Q3: 208721.4775
95th percentile: 518634.1965
Maximum: 92445516.64
Range: 92445516.64
Interquartile range (IQR): 195331.9075

Descriptive statistics

Standard deviation: 603858.2315
Coefficient of variation (CV): 3.357343715
Kurtosis: 1797.956705
Mean: 179861.9035
Median absolute deviation (MAD): 68393.655
Skewness: 30.99394948
Sum: 1.144392945 × 10^12
Variance: 3.646447637 × 10^11
Monotonicity: Not monotonic
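The report does not state which skewness and kurtosis estimators it uses. They are presumably pandas' Series.skew() and Series.kurt(), which are the bias-adjusted Fisher-Pearson skewness G1 and the excess kurtosis G2. A normal distribution scores 0 on both, so amount's skewness of about 31 and kurtosis of about 1798 indicate an extremely long right tail. Below is a stdlib sketch of the two formulas. The toy samples are made up; the second one mimics the effect of a single extreme value.

```python
import math

def skew_and_excess_kurtosis(xs):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2).

    These are the estimators used by pandas Series.skew() / Series.kurt();
    at least 4 values are required.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # central moments
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return G1, G2

# Symmetric sample: expected G1 = 0 and G2 = -1.2 (up to float rounding).
print(skew_and_excess_kurtosis([1, 2, 3, 4, 5]))
# One large outlier produces a strong right skew.
print(skew_and_excess_kurtosis([1, 2, 3, 4, 100]))
```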
Histogram with fixed size bins (bins=50)

Common Values

Value     Count  Frequency (%)
10000000  3207   0.1%
10000     88     < 0.1%
5000      79     < 0.1%
15000     68     < 0.1%
500       65     < 0.1%
100000    42     < 0.1%
21500     37     < 0.1%
120000    29     < 0.1%
135000    20     < 0.1%
0         16     < 0.1%
Other values (5316890)  6358969  99.9%

Minimum 10 values

Value  Count  Frequency (%)
0      16     < 0.1%
0.01   1      < 0.1%
0.02   3      < 0.1%
0.03   2      < 0.1%
0.04   1      < 0.1%
0.06   1      < 0.1%
0.07   1      < 0.1%
0.09   1      < 0.1%
0.1    1      < 0.1%
0.11   2      < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
92445516.64  1      < 0.1%
73823490.36  1      < 0.1%
71172480.42  1      < 0.1%
69886731.3   1      < 0.1%
69337316.27  1      < 0.1%
67500761.29  1      < 0.1%
66761272.21  1      < 0.1%
64234448.19  1      < 0.1%
63847992.58  1      < 0.1%
63294839.63  1      < 0.1%

name_orig
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 6353307
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

C1902386530: 3
C363736674: 3
C545315117: 3
C724452879: 3
C1784010646: 3
Other values (6353302): 6362605

Length

Max length: 11
Median length: 11
Mean length: 10.48232332
Min length: 5

Characters and Unicode

Total characters: 66695040
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6344009
Unique (%): 99.7%
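Distinct and Unique measure different things. Distinct counts how many different values occur (6353307 for name_orig). Unique counts the values that occur exactly once (6344009). The gap between the two is the number of IDs that repeat. A minimal sketch on made-up IDs, followed by that gap for name_orig:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): number of different values, and number seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Made-up account IDs: "C1" appears twice, the others once.
ids = ["C1", "C2", "C1", "C3", "C4"]
print(distinct_and_unique(ids))  # (4, 3)

# For name_orig: distinct values that occur more than once.
print(6353307 - 6344009)
```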

Sample

1st row: C1231006815
2nd row: C1666544295
3rd row: C1305486145
4th row: C840083671
5th row: C2048537720

Common Values

Value        Count  Frequency (%)
C1902386530  3      < 0.1%
C363736674   3      < 0.1%
C545315117   3      < 0.1%
C724452879   3      < 0.1%
C1784010646  3      < 0.1%
C1677795071  3      < 0.1%
C1462946854  3      < 0.1%
C1999539787  3      < 0.1%
C2098525306  3      < 0.1%
C400299098   3      < 0.1%
Other values (6353297)  6362590  > 99.9%

Length

Histogram of lengths of the category

Words (lowercased)

Value        Count  Frequency (%)
c1902386530  3      < 0.1%
c2098525306  3      < 0.1%
c363736674   3      < 0.1%
c1530544995  3      < 0.1%
c1065307291  3      < 0.1%
c2051359467  3      < 0.1%
c1832548028  3      < 0.1%
c400299098   3      < 0.1%
c1976208114  3      < 0.1%
c1999539787  3      < 0.1%
Other values (6353297)  6362590  > 99.9%

Most occurring characters

Value  Count    Frequency (%)
1      8803448  13.2%
C      6362620  9.5%
2      6136135  9.2%
3      5699596  8.5%
4      5693146  8.5%
7      5669437  8.5%
5      5668010  8.5%
6      5667725  8.5%
0      5667074  8.5%
9      5665212  8.5%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    60332420  90.5%
Uppercase Letter  6362620   9.5%

Most frequent character per category

Decimal Number

Value  Count    Frequency (%)
1      8803448  14.6%
2      6136135  10.2%
3      5699596  9.4%
4      5693146  9.4%
7      5669437  9.4%
5      5668010  9.4%
6      5667725  9.4%
0      5667074  9.4%
9      5665212  9.4%
8      5662637  9.4%

Uppercase Letter

Value  Count    Frequency (%)
C      6362620  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  60332420  90.5%
Latin   6362620   9.5%

Most frequent character per script

Common

Value  Count    Frequency (%)
1      8803448  14.6%
2      6136135  10.2%
3      5699596  9.4%
4      5693146  9.4%
7      5669437  9.4%
5      5668010  9.4%
6      5667725  9.4%
0      5667074  9.4%
9      5665212  9.4%
8      5662637  9.4%

Latin

Value  Count    Frequency (%)
C      6362620  100.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  66695040  100.0%

Most frequent character per block

ASCII

Value  Count    Frequency (%)
1      8803448  13.2%
C      6362620  9.5%
2      6136135  9.2%
3      5699596  8.5%
4      5693146  8.5%
7      5669437  8.5%
5      5668010  8.5%
6      5667725  8.5%
0      5667074  8.5%
9      5665212  8.5%

oldbalance_org
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1845844
Distinct (%): 29.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 833883.1041
Minimum: 0
Maximum: 59585040.37
Zeros: 2102449
Zeros (%): 33.0%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 14208
Q3: 107315.175
95th percentile: 5823702.278
Maximum: 59585040.37
Range: 59585040.37
Interquartile range (IQR): 107315.175

Descriptive statistics

Standard deviation: 2888242.673
Coefficient of variation (CV): 3.46360618
Kurtosis: 32.96487854
Mean: 833883.1041
Median absolute deviation (MAD): 14208
Skewness: 5.249136421
Sum: 5.305681316 × 10^12
Variance: 8.341945738 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value  Count    Frequency (%)
0      2102449  33.0%
184    918      < 0.1%
133    914      < 0.1%
195    912      < 0.1%
164    909      < 0.1%
181    908      < 0.1%
109    908      < 0.1%
157    902      < 0.1%
146    899      < 0.1%
128    898      < 0.1%
Other values (1845834)  4252003  66.8%

Minimum 10 values

Value  Count    Frequency (%)
0      2102449  33.0%
0.05   1        < 0.1%
0.18   1        < 0.1%
0.21   1        < 0.1%
0.44   1        < 0.1%
0.67   1        < 0.1%
1      370      < 0.1%
1.02   1        < 0.1%
1.37   1        < 0.1%
1.38   1        < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
59585040.37  1      < 0.1%
57316255.05  1      < 0.1%
50399045.08  1      < 0.1%
49585040.37  1      < 0.1%
47316255.05  1      < 0.1%
45674547.89  1      < 0.1%
44892193.09  1      < 0.1%
43818855.3   1      < 0.1%
43686616.33  1      < 0.1%
42542664.27  1      < 0.1%

newbalance_orig
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2682586
Distinct (%): 42.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 855113.6686
Minimum: 0
Maximum: 49585040.37
Zeros: 3609566
Zeros (%): 56.7%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 144258.41
95th percentile: 5980262.336
Maximum: 49585040.37
Range: 49585040.37
Interquartile range (IQR): 144258.41

Descriptive statistics

Standard deviation: 2924048.503
Coefficient of variation (CV): 3.419485164
Kurtosis: 32.06698456
Mean: 855113.6686
Median absolute deviation (MAD): 0
Skewness: 5.176884001
Sum: 5.44076333 × 10^12
Variance: 8.550059648 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value     Count    Frequency (%)
0         3609566  56.7%
5888.64   4        < 0.1%
15073.44  4        < 0.1%
5122      4        < 0.1%
36875.73  4        < 0.1%
10528.49  4        < 0.1%
904.13    4        < 0.1%
18392.51  4        < 0.1%
32926.52  4        < 0.1%
4277.69   4        < 0.1%
Other values (2682576)  2753018  43.3%

Minimum 10 values

Value  Count    Frequency (%)
0      3609566  56.7%
0.01   1        < 0.1%
0.03   1        < 0.1%
0.05   1        < 0.1%
0.12   1        < 0.1%
0.13   1        < 0.1%
0.18   1        < 0.1%
0.21   1        < 0.1%
0.23   1        < 0.1%
0.3    1        < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
49585040.37  1      < 0.1%
47316255.05  1      < 0.1%
43686616.33  1      < 0.1%
43673802.21  1      < 0.1%
41690842.64  1      < 0.1%
41432359.46  1      < 0.1%
40399045.08  1      < 0.1%
39585040.37  1      < 0.1%
38946233.02  1      < 0.1%
38939424.03  1      < 0.1%

name_dest
Categorical

HIGH CARDINALITY

Distinct: 2722362
Distinct (%): 42.8%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

C1286084959: 113
C985934102: 109
C665576141: 105
C2083562754: 102
C248609774: 101
Other values (2722357): 6362090

Length

Max length: 11
Median length: 11
Mean length: 10.48175201
Min length: 2

Characters and Unicode

Total characters: 66691405
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2262704
Unique (%): 35.6%

Sample

1st row: M1979787155
2nd row: M2044282225
3rd row: C553264065
4th row: C38997010
5th row: M1230701703

Common Values

Value        Count  Frequency (%)
C1286084959  113    < 0.1%
C985934102   109    < 0.1%
C665576141   105    < 0.1%
C2083562754  102    < 0.1%
C248609774   101    < 0.1%
C1590550415  101    < 0.1%
C451111351   99     < 0.1%
C1789550256  99     < 0.1%
C1360767589  98     < 0.1%
C1023714065  97     < 0.1%
Other values (2722352)  6361596  > 99.9%

Length

Histogram of lengths of the category

Words (lowercased)

Value        Count  Frequency (%)
c1286084959  113    < 0.1%
c985934102   109    < 0.1%
c665576141   105    < 0.1%
c2083562754  102    < 0.1%
c248609774   101    < 0.1%
c1590550415  101    < 0.1%
c451111351   99     < 0.1%
c1789550256  99     < 0.1%
c1360767589  98     < 0.1%
c1023714065  97     < 0.1%
Other values (2722352)  6361596  > 99.9%

Most occurring characters

Value  Count    Frequency (%)
1      8799996  13.2%
2      6133780  9.2%
3      5704404  8.6%
4      5691070  8.5%
8      5675627  8.5%
9      5668861  8.5%
7      5665128  8.5%
0      5664751  8.5%
6      5662897  8.5%
5      5662271  8.5%
Other values (2)  6362620  9.5%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    60328785  90.5%
Uppercase Letter  6362620   9.5%

Most frequent character per category

Decimal Number

Value  Count    Frequency (%)
1      8799996  14.6%
2      6133780  10.2%
3      5704404  9.5%
4      5691070  9.4%
8      5675627  9.4%
9      5668861  9.4%
7      5665128  9.4%
0      5664751  9.4%
6      5662897  9.4%
5      5662271  9.4%

Uppercase Letter

Value  Count    Frequency (%)
C      4211125  66.2%
M      2151495  33.8%

Most occurring scripts

Value   Count     Frequency (%)
Common  60328785  90.5%
Latin   6362620   9.5%

Most frequent character per script

Common

Value  Count    Frequency (%)
1      8799996  14.6%
2      6133780  10.2%
3      5704404  9.5%
4      5691070  9.4%
8      5675627  9.4%
9      5668861  9.4%
7      5665128  9.4%
0      5664751  9.4%
6      5662897  9.4%
5      5662271  9.4%

Latin

Value  Count    Frequency (%)
C      4211125  66.2%
M      2151495  33.8%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  66691405  100.0%

Most frequent character per block

ASCII

Value  Count    Frequency (%)
1      8799996  13.2%
2      6133780  9.2%
3      5704404  8.6%
4      5691070  8.5%
8      5675627  8.5%
9      5668861  8.5%
7      5665128  8.5%
0      5664751  8.5%
6      5662897  8.5%
5      5662271  8.5%
Other values (2)  6362620  9.5%

oldbalance_dest
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3614697
Distinct (%): 56.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1100701.667
Minimum: 0
Maximum: 356015889.4
Zeros: 2704388
Zeros (%): 42.5%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 132705.665
Q3: 943036.7075
95th percentile: 5147229.713
Maximum: 356015889.4
Range: 356015889.4
Interquartile range (IQR): 943036.7075

Descriptive statistics

Standard deviation: 3399180.113
Coefficient of variation (CV): 3.088193846
Kurtosis: 948.6741254
Mean: 1100701.667
Median absolute deviation (MAD): 132705.665
Skewness: 19.92175792
Sum: 7.003346437 × 10^12
Variance: 1.155442544 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value     Count    Frequency (%)
0         2704388  42.5%
10000000  615      < 0.1%
20000000  219      < 0.1%
30000000  86       < 0.1%
40000000  31       < 0.1%
102       21       < 0.1%
198       19       < 0.1%
125       18       < 0.1%
160       18       < 0.1%
132       18       < 0.1%
Other values (3614687)  3657187  57.5%

Minimum 10 values

Value  Count    Frequency (%)
0      2704388  42.5%
0.01   1        < 0.1%
0.03   1        < 0.1%
0.13   1        < 0.1%
0.33   1        < 0.1%
0.37   1        < 0.1%
0.79   1        < 0.1%
1      7        < 0.1%
1.39   1        < 0.1%
1.64   1        < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
356015889.4  1      < 0.1%
355553416.3  1      < 0.1%
355381433.6  1      < 0.1%
355380483.5  1      < 0.1%
355185537.1  1      < 0.1%
328194464.9  1      < 0.1%
327998074.2  1      < 0.1%
327963024    1      < 0.1%
327852121.4  1      < 0.1%
327827763.4  1      < 0.1%

newbalance_dest
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3555499
Distinct (%): 55.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1224996.398
Minimum: 0
Maximum: 356179278.9
Zeros: 2439433
Zeros (%): 38.3%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 214661.44
Q3: 1111909.25
95th percentile: 5515715.903
Maximum: 356179278.9
Range: 356179278.9
Interquartile range (IQR): 1111909.25

Descriptive statistics

Standard deviation: 3674128.942
Coefficient of variation (CV): 2.999297751
Kurtosis: 862.1565079
Mean: 1224996.398
Median absolute deviation (MAD): 214661.44
Skewness: 19.35230206
Sum: 7.794186583 × 10^12
Variance: 1.349922348 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value        Count    Frequency (%)
0            2439433  38.3%
10000000     53       < 0.1%
971418.91    32       < 0.1%
19169204.93  29       < 0.1%
1254956.07   25       < 0.1%
16532032.16  25       < 0.1%
1412484.09   22       < 0.1%
4743010.67   21       < 0.1%
1178808.14   21       < 0.1%
7364724.84   21       < 0.1%
Other values (3555489)  3922938  61.7%

Minimum 10 values

Value  Count    Frequency (%)
0      2439433  38.3%
0.01   1        < 0.1%
0.33   1        < 0.1%
1.39   1        < 0.1%
1.64   1        < 0.1%
1.74   1        < 0.1%
2.15   1        < 0.1%
2.45   1        < 0.1%
2.71   1        < 0.1%
2.76   1        < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
356179278.9  1      < 0.1%
356015889.4  1      < 0.1%
355553416.3  2      < 0.1%
355381433.6  1      < 0.1%
355380483.5  1      < 0.1%
355185537.1  1      < 0.1%
328431698.2  1      < 0.1%
328194464.9  1      < 0.1%
327998074.2  1      < 0.1%
327963024    1      < 0.1%

is_fraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

0: 6354407
1: 8213

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6362620
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%
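The target is heavily imbalanced: 8213 of 6362620 transactions (about 0.13%) are labelled as fraud, which is roughly 774 legitimate transactions per fraudulent one. A negatives-to-positives ratio like this is a common starting point for reweighting the positive class; XGBoost's scale_pos_weight, for example, is often set this way. Whether that suits this dataset is a modelling choice the report does not address. The arithmetic, using the counts above:

```python
# Class balance of is_fraud, using the counts from the report.
n_legit, n_fraud = 6_354_407, 8_213
n_total = n_legit + n_fraud

fraud_rate = n_fraud / n_total       # share of positive labels
imbalance_ratio = n_legit / n_fraud  # legitimate transactions per fraudulent one

print(f"rows: {n_total}")
print(f"fraud rate: {fraud_rate:.4%}")
print(f"legitimate per fraud: {imbalance_ratio:.1f}")
```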

Length

Histogram of lengths of the category

Category Frequency Plot

Words (lowercased)

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Most occurring characters

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  6362620  100.0%

Most frequent character per category

Decimal Number

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Most occurring scripts

Value   Count    Frequency (%)
Common  6362620  100.0%

Most frequent character per script

Common

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  6362620  100.0%

Most frequent character per block

ASCII

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

is_flagged_fraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

0: 6362604
1: 16

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6362620
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Words (lowercased)

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Most occurring characters

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  6362620  100.0%

Most frequent character per category

Decimal Number

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Most occurring scripts

Value   Count    Frequency (%)
Common  6362620  100.0%

Most frequent character per script

Common

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  6362620  100.0%

Most frequent character per block

ASCII

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

diff_org_balance
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 907958
Distinct (%): 14.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -201092.08
Minimum: -92445516
Maximum: 0
Zeros: 1361836
Zeros (%): 21.4%
Negative: 5000784
Negative (%): 78.6%
Memory size: 48.5 MiB

Quantile statistics

Minimum: -92445516
5th percentile: -700716.05
Q1: -249640.25
Median: -68677
Q3: -2954
95th percentile: 0
Maximum: 0
Range: 92445516
Interquartile range (IQR): 246686.25

Descriptive statistics

Standard deviation: 606650.429
Coefficient of variation (CV): -3.016779324
Kurtosis: 1753.268826
Mean: -201092.08
Median absolute deviation (MAD): 68677
Skewness: -30.07475092
Sum: -1.27947249 × 10^12
Variance: 3.68024743 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value      Count    Frequency (%)
0          1361836  21.4%
-10000000  1085     < 0.1%
-1000      168      < 0.1%
-10000     133      < 0.1%
-20000     121      < 0.1%
-84        108      < 0.1%
-1944      108      < 0.1%
-2514      105      < 0.1%
-1982      105      < 0.1%
-510       104      < 0.1%
Other values (907948)  4998747  78.6%

Minimum 10 values

Value      Count  Frequency (%)
-92445516  1      < 0.1%
-73823490  1      < 0.1%
-71172480  1      < 0.1%
-69886731  1      < 0.1%
-69337316  1      < 0.1%
-67500761  1      < 0.1%
-66761272  1      < 0.1%
-64234448  1      < 0.1%
-63847992  1      < 0.1%
-63294839  1      < 0.1%

Maximum 10 values

Value  Count    Frequency (%)
0      1361836  21.4%
-1     65       < 0.1%
-2     70       < 0.1%
-3     59       < 0.1%
-4     59       < 0.1%
-5     71       < 0.1%
-6     83       < 0.1%
-7     85       < 0.1%
-8     79       < 0.1%
-9     71       < 0.1%

diff_dest_balance
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1197049
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -304156.2303
Minimum: -184891033
Maximum: 12930418
Zeros: 1032168
Zeros (%): 16.2%
Negative: 5273086
Negative (%): 82.9%
Memory size: 48.5 MiB

Quantile statistics

Minimum: -184891033
5th percentile: -1056983.3
Q1: -308391
Median: -27748
Q3: -3815
95th percentile: 0
Maximum: 12930418
Range: 197821451
Interquartile range (IQR): 304576

Descriptive statistics

Standard deviation: 1362380.9
Coefficient of variation (CV): -4.479214182
Kurtosis: 1554.520049
Mean: -304156.2303
Median absolute deviation (MAD): 27748
Skewness: -30.33109298
Sum: -1.935230514 × 10^12
Variance: 1.856081717 × 10^12
Monotonicity: Not monotonic
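The report does not say how diff_org_balance and diff_dest_balance were derived. One hypothesis consistent with their means is "old balance minus amount minus new balance, converted to an integer". Because the mean is linear, the mean of such a column must equal the same combination of the reported means, up to the integer-conversion error (under 1 per row, so under 1 on average). The check below shows that this holds for both columns. It does not prove the definition. Two further observations are consistent with it: the minimum of diff_org_balance (-92445516) is the integer part of the negated maximum amount, and diff_org_balance is never positive.

```python
# Hypothesis check (NOT a documented derivation): is
#   diff_X_balance ~= int(oldbalance_X - amount - newbalance_X)?
# Means are linear, so the means must agree up to the integer-conversion error (< 1).
mean_amount = 179861.9035

candidates = {
    # name: (mean old balance, mean new balance, reported mean of the diff column)
    "diff_org_balance": (833883.1041, 855113.6686, -201092.08),
    "diff_dest_balance": (1100701.667, 1224996.398, -304156.2303),
}

for name, (mean_old, mean_new, reported) in candidates.items():
    implied = mean_old - mean_amount - mean_new
    gap = reported - implied
    print(f"{name}: implied mean {implied:.2f}, reported {reported:.2f}, gap {gap:.2f}")
```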
Histogram with fixed size bins (bins=50)

Common Values

Value  Count    Frequency (%)
0      1032168  16.2%
-2216  168      < 0.1%
-1181  168      < 0.1%
-1005  166      < 0.1%
-1339  166      < 0.1%
-1722  166      < 0.1%
-2752  166      < 0.1%
-497   165      < 0.1%
-1797  165      < 0.1%
-1259  165      < 0.1%
Other values (1197039)  5328957  83.8%

Minimum 10 values

Value       Count  Frequency (%)
-184891033  1      < 0.1%
-164559608  1      < 0.1%
-147646980  1      < 0.1%
-142344960  1      < 0.1%
-139773462  1      < 0.1%
-138674632  1      < 0.1%
-135767972  1      < 0.1%
-135489952  1      < 0.1%
-135001522  1      < 0.1%
-133522544  1      < 0.1%

Maximum 10 values

Value     Count  Frequency (%)
12930418  1      < 0.1%
9385209   1      < 0.1%
5672835   1      < 0.1%
5315828   1      < 0.1%
5263390   1      < 0.1%
5114161   1      < 0.1%
4834325   1      < 0.1%
4755039   1      < 0.1%
4565838   1      < 0.1%
4525018   1      < 0.1%

merchant_dest
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

0: 4211125
1: 2151495

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6362620
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 1
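merchant_dest looks like a flag for destination accounts whose IDs start with "M". Three observations in the report support this:

- Its count of 1s (2151495) equals the number of "M" characters in name_dest.
- The five sample rows agree: name_dest starts with M, M, C, C, M and merchant_dest is 1, 1, 0, 0, 1.
- The same count equals the number of PAYMENT transactions, which would explain the high-correlation alert with type.

The derivation is not documented, so the function below is a hypothetical reconstruction:

```python
def merchant_flag(name_dest: str) -> int:
    """Hypothetical derivation: 1 if the destination ID starts with 'M', else 0."""
    return int(name_dest.startswith("M"))

# The first five name_dest values shown in the report's sample rows.
sample_dest = ["M1979787155", "M2044282225", "C553264065", "C38997010", "M1230701703"]
print([merchant_flag(d) for d in sample_dest])  # [1, 1, 0, 0, 1]
```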

Common Values

Value  Count    Frequency (%)
0      4211125  66.2%
1      2151495  33.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Words (lowercased)

Value  Count    Frequency (%)
0      4211125  66.2%
1      2151495  33.8%

Most occurring characters

Value  Count    Frequency (%)
0      4211125  66.2%
1      2151495  33.8%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  6362620  100.0%

Most frequent character per category

Decimal Number

Value  Count    Frequency (%)
0      4211125  66.2%
1      2151495  33.8%

Most occurring scripts

Value   Count    Frequency (%)
Common  6362620  100.0%

Most frequent character per script

Common

Value  Count    Frequency (%)
0      4211125  66.2%
1      2151495  33.8%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  6362620  100.0%

Most frequent character per block

ASCII

Value  Count    Frequency (%)
0      4211125  66.2%
1      2151495  33.8%

day
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.49190679
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 7
Median: 10
Q3: 14
95th percentile: 21
Maximum: 31
Range: 30
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.921812248
Coefficient of variation (CV): 0.5644171612
Kurtosis: 0.3323014534
Mean: 10.49190679
Median absolute deviation (MAD): 4
Skewness: 0.3778477393
Sum: 66756016
Variance: 35.0678603
Monotonicity: Increasing
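"Increasing" monotonicity means the column never decreases from one row to the next; repeated values are allowed. For step and day, this suggests the rows are stored in time order. pandas exposes the same non-strict check as Series.is_monotonic_increasing. The sketch below assumes the report's "Increasing" label corresponds to that check, and gives an equivalent in plain Python:

```python
def is_monotonic_increasing(xs):
    """True if each value is >= the one before it (ties allowed), like pandas' check."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

print(is_monotonic_increasing([1, 1, 2, 5, 5]))  # True
print(is_monotonic_increasing([1, 3, 2]))        # False
```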
Histogram with fixed size bins (bins=31)

Common Values

Value  Count   Frequency (%)
1      574255  9.0%
2      455238  7.2%
8      449637  7.1%
6      441005  6.9%
13     428583  6.7%
17     425766  6.7%
7      420583  6.6%
9      417919  6.6%
11     417859  6.6%
15     401282  6.3%
Other values (21)  1930493  30.3%

Minimum 10 values

Value  Count   Frequency (%)
1      574255  9.0%
2      455238  7.2%
3      1070    < 0.1%
4      28240   0.4%
5      9789    0.2%
6      441005  6.9%
7      420583  6.6%
8      449637  7.1%
9      417919  6.6%
10     392945  6.2%

Maximum 10 values

Value  Count  Frequency (%)
31     272    < 0.1%
30     11287  0.2%
29     54890  0.9%
28     14661  0.2%
27     8578   0.1%
26     13885  0.2%
25     57853  0.9%
24     32709  0.5%
23     51012  0.8%
22     53437  0.8%
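The report flags step and day as highly correlated, and both are monotonically increasing. One mapping consistent with the reported ranges (step 1 to 743, day 1 to 31) treats step as an hour index and groups every 24 steps into a day. Under it, the most common steps land on the busiest days; for example, step 403 maps to day 17. How day was actually derived is not documented in this report, so the function below is a hypothetical reconstruction:

```python
def day_from_step(step: int) -> int:
    """Hypothetical mapping: steps 1-24 -> day 1, 25-48 -> day 2, and so on."""
    return (step - 1) // 24 + 1

for s in (1, 24, 25, 403, 743):
    print(s, day_from_step(s))
```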

Interactions

2023-01-04T19:10:00.796896image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2023-01-04T19:10:22.294956image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2023-01-04T19:10:45.997365image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2023-01-04T19:11:09.292325image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

Correlations


Auto

The Auto setting picks an easily interpretable pairwise metric for each pair of columns according to their variable types: categorical-categorical pairs use Cramér's V; numerical-categorical pairs use Cramér's V computed on a discretized version of the numerical column; numerical-numerical pairs use Spearman's ρ. This way, each pair of columns is measured with the most suitable method.
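The type-based mapping above can be sketched as a small dispatcher. This is a minimal illustration, not pandas-profiling's code. It decides "numerical" from the pandas dtype plus a hypothetical `categorical` override, because the report's own type inference is richer. For instance, the report counts 9 numeric and 6 categorical variables, which suggests that 0/1 integer columns such as is_fraud were treated as categorical. The names `auto_method` and `auto_plan` are made up for this example.

```python
import itertools

import pandas as pd


def auto_method(a: pd.Series, b: pd.Series, categorical=frozenset()) -> str:
    """Return the metric the Auto mapping would pick for one column pair.

    Simplification (assumption): a column is numerical if its dtype is numeric
    and its name is not listed in `categorical`.
    """
    a_num = pd.api.types.is_numeric_dtype(a) and a.name not in categorical
    b_num = pd.api.types.is_numeric_dtype(b) and b.name not in categorical
    if a_num and b_num:
        return "spearman"  # numerical-numerical
    # categorical-categorical, or numerical-categorical with the
    # numerical column discretized before computing Cramer's V
    return "cramers_v"


def auto_plan(df: pd.DataFrame, categorical=frozenset()) -> dict:
    """Map every unordered column pair to the metric Auto would use."""
    return {
        (x, y): auto_method(df[x], df[y], categorical)
        for x, y in itertools.combinations(df.columns, 2)
    }


df = pd.DataFrame({
    "step": [1, 1, 743],
    "amount": [9839.64, 181.0, 850002.52],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT"],
    "is_fraud": [0, 1, 1],
})
for pair, method in auto_plan(df, categorical={"is_fraud"}).items():
    print(pair, method)
```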

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
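The definition above can be sketched as follows, assuming NumPy and SciPy are available. The sketch computes ρ as the Pearson correlation of the rank variables and cross-checks it against `scipy.stats.spearmanr`. The helper name `spearman_rho` is hypothetical. Ties get average ranks via `scipy.stats.rankdata`, the same convention `spearmanr` uses.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def spearman_rho(x, y) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank variables.

    Tied values receive the average of the ranks they span.
    """
    rx = rankdata(x)
    ry = rankdata(y)
    cov = np.cov(rx, ry, ddof=1)[0, 1]
    return float(cov / (np.std(rx, ddof=1) * np.std(ry, ddof=1)))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.exp(x)  # nonlinear but strictly increasing in x

print(spearman_rho(x, y))        # approximately 1.0: perfectly monotonic
print(np.corrcoef(x, y)[0, 1])   # below 1.0: Pearson's r only measures linear association

rho_scipy, _ = spearmanr(x, y)   # cross-check against SciPy
print(rho_scipy)
```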

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship, the slope therefore affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
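A minimal NumPy sketch of this definition, which also checks the invariance property described above. The helper `pearson_r` is hypothetical and the data are synthetic.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y, ddof=1)[0, 1]
    return float(cov / (np.std(x, ddof=1) * np.std(y, ddof=1)))


rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
r = pearson_r(x, y)

# Separate changes of location and positive scale leave r unchanged
print(np.isclose(r, pearson_r(3 * x + 10, 0.01 * y - 7)))  # True

# An exact linear relation gives |r| = 1 whatever the slope; the slope only sets the sign
print(pearson_r(x, -4 * x + 2))  # approximately -1.0

# Cross-check against NumPy's built-in correlation matrix
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True
```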

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
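The pair-counting definition above translates directly into an O(n²) sketch. The variant that uses the total number of pairs as its denominator is usually called τ-a. The helper `kendall_tau_a` is hypothetical. SciPy's `kendalltau` defaults to the τ-b variant, which adjusts the denominator for ties. The cross-check below therefore uses tie-free data, where the two variants agree.

```python
from itertools import combinations

from scipy.stats import kendalltau


def kendall_tau_a(x, y) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Pairs tied in x or y count as neither concordant nor discordant.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))  # 0.4 (7 concordant, 3 discordant, 10 pairs)

tau_b, _ = kendalltau(x, y)  # SciPy's tau-b; equal to tau-a when there are no ties
print(tau_b)
```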

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We therefore use the bias-corrected measure proposed by Bergsma (2013).
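The following sketch implements Bergsma's correction, using SciPy's `chi2_contingency` for the χ² statistic. It reduces φ² = χ²/n by its expected bias (k-1)(r-1)/(n-1) and clips the result at zero. It then corrects the table dimensions r and k in the same spirit before dividing. The helper name is hypothetical. This sketch disables Yates' continuity correction; whether the report applies it is not stated here.

```python
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v_corrected(table) -> float:
    """Bias-corrected Cramer's V (Bergsma, 2013) for an r x k contingency table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    r, k = table.shape
    chi2 = chi2_contingency(table, correction=False)[0]  # no Yates correction
    phi2 = chi2 / n
    # Subtract the expected bias of phi^2 and clip at zero
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Bias-corrected numbers of rows and columns
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return float(np.sqrt(phi2_corr / denom)) if denom > 0 else 0.0


perfect = np.diag([100, 100, 100])   # each row category maps to one column category
independent = [[10, 10], [10, 10]]   # identical row profiles
print(cramers_v_corrected(perfect))      # approximately 1.0
print(cramers_v_corrected(independent))  # 0.0
```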

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
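Both views rest on the boolean mask returned by `DataFrame.isna()` and `DataFrame.notna()`. The following minimal pandas sketch uses a small made-up frame, since this dataset itself has no missing cells. It is only roughly what the two plots summarize, not the report's plotting code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "amount": [9839.64, np.nan, 181.0, 181.0],
    "type": ["PAYMENT", "PAYMENT", None, "CASH_OUT"],
    "step": [1, 1, 1, 1],
})

# Count view: number of non-missing values per column
counts = df.notna().sum()
print(counts)

# Matrix view: one row per record, 1 where the cell is present, 0 where it is missing
nullity = df.notna().to_numpy()
print(nullity.astype(int))

# Overall share of missing cells, comparable to "Missing cells (%)"
print(f"{df.isna().to_numpy().mean():.1%}")  # 16.7%
```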

Sample

First rows

   step  type      amount      name_orig    oldbalance_org  newbalance_orig  name_dest    oldbalance_dest  newbalance_dest  is_fraud  is_flagged_fraud  diff_org_balance  diff_dest_balance  merchant_dest  day
0  1     PAYMENT   9839.6400   C1231006815  170136.0000     160296.3600      M1979787155  0.0000           0.0000           0         0                 0                 -9839              1              1
1  1     PAYMENT   1864.2800   C1666544295  21249.0000      19384.7200       M2044282225  0.0000           0.0000           0         0                 0                 -1864              1              1
2  1     TRANSFER  181.0000    C1305486145  181.0000        0.0000           C553264065   0.0000           0.0000           1         0                 0                 -181               0              1
3  1     CASH_OUT  181.0000    C840083671   181.0000        0.0000           C38997010    21182.0000       0.0000           1         0                 0                 21001              0              1
4  1     PAYMENT   11668.1400  C2048537720  41554.0000      29885.8600       M1230701703  0.0000           0.0000           0         0                 0                 -11668             1              1
5  1     PAYMENT   7817.7100   C90045638    53860.0000      46042.2900       M573487274   0.0000           0.0000           0         0                 0                 -7817              1              1
6  1     PAYMENT   7107.7700   C154988899   183195.0000     176087.2300      M408069119   0.0000           0.0000           0         0                 0                 -7107              1              1
7  1     PAYMENT   7861.6400   C1912850431  176087.2300     168225.5900      M633326333   0.0000           0.0000           0         0                 0                 -7861              1              1
8  1     PAYMENT   4024.3600   C1265012928  2671.0000       0.0000           M1176932104  0.0000           0.0000           0         0                 -1353             -4024              1              1
9  1     DEBIT     5337.7700   C712410124   41720.0000      36382.2300       C195600860   41898.0000       40348.7900       0         0                 0                 -3788              0              1

Last rows

         step  type      amount        name_orig    oldbalance_org  newbalance_orig  name_dest    oldbalance_dest  newbalance_dest  is_fraud  is_flagged_fraud  diff_org_balance  diff_dest_balance  merchant_dest  day
6362610  742   TRANSFER  63416.9900    C778071008   63416.9900      0.0000           C1812552860  0.0000           0.0000           1         0                 0                 -63416             0              31
6362611  742   CASH_OUT  63416.9900    C994950684   63416.9900      0.0000           C1662241365  276433.1800      339850.1700      1         0                 0                 -126833            0              31
6362612  743   TRANSFER  1258818.8200  C1531301470  1258818.8200    0.0000           C1470998563  0.0000           0.0000           1         0                 0                 -1258818           0              31
6362613  743   CASH_OUT  1258818.8200  C1436118706  1258818.8200    0.0000           C1240760502  503464.5000      1762283.3300     1         0                 0                 -2517637           0              31
6362614  743   TRANSFER  339682.1300   C2013999242  339682.1300     0.0000           C1850423904  0.0000           0.0000           1         0                 0                 -339682            0              31
6362615  743   CASH_OUT  339682.1300   C786484425   339682.1300     0.0000           C776919290   0.0000           339682.1300      1         0                 0                 -679364            0              31
6362616  743   TRANSFER  6311409.2800  C1529008245  6311409.2800    0.0000           C1881841831  0.0000           0.0000           1         0                 0                 -6311409           0              31
6362617  743   CASH_OUT  6311409.2800  C1162922333  6311409.2800    0.0000           C1365125890  68488.8400       6379898.1100     1         0                 0                 -12622818          0              31
6362618  743   TRANSFER  850002.5200   C1685995037  850002.5200     0.0000           C2080388513  0.0000           0.0000           1         0                 0                 -850002            0              31
6362619  743   CASH_OUT  850002.5200   C1280323807  850002.5200     0.0000           C873221189   6510099.1100     7360101.6300     1         0                 0                 -1700005           0              31
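The engineered columns in the sample appear to follow simple formulas. The sketch below is inferred from the rows shown above, not taken from the original analysis:

- diff_org_balance = oldbalance_org - amount - newbalance_orig, truncated toward zero.
- diff_dest_balance = oldbalance_dest - amount - newbalance_dest, truncated toward zero.
- merchant_dest flags destination ids that start with "M".
- day is assumed to be step // 24 + 1. This matches the rows shown (steps 1 and 742-743 map to days 1 and 31) but is not confirmed by the report.

The function name `add_derived_columns` is hypothetical.

```python
import pandas as pd


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute the engineered columns from the raw transaction columns.

    Formulas are inferred from the sample rows, not taken from the original
    analysis; the `day` formula in particular is an assumption.
    """
    out = df.copy()
    # Balance residuals; astype(int) truncates toward zero, as in the sample rows
    out["diff_org_balance"] = (
        out["oldbalance_org"] - out["amount"] - out["newbalance_orig"]
    ).astype(int)
    out["diff_dest_balance"] = (
        out["oldbalance_dest"] - out["amount"] - out["newbalance_dest"]
    ).astype(int)
    # 1 when the destination account id starts with "M", else 0
    out["merchant_dest"] = out["name_dest"].str.startswith("M").astype(int)
    # Assumption: one step is one hour and days are numbered from 1
    out["day"] = out["step"] // 24 + 1
    return out


# Three rows copied from the sample tables above (raw columns only)
sample = pd.DataFrame({
    "step": [1, 1, 743],
    "amount": [9839.64, 4024.36, 850002.52],
    "oldbalance_org": [170136.0, 2671.0, 850002.52],
    "newbalance_orig": [160296.36, 0.0, 0.0],
    "name_dest": ["M1979787155", "M1176932104", "C873221189"],
    "oldbalance_dest": [0.0, 0.0, 6510099.11],
    "newbalance_dest": [0.0, 0.0, 7360101.63],
})
cols = ["diff_org_balance", "diff_dest_balance", "merchant_dest", "day"]
print(add_derived_columns(sample)[cols])
```

On these three rows, the sketch reproduces the values shown in the sample tables.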